Align tiny-Glm4MoeForCausalLM with GLM-4.5 reference config #5638
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 2 total unresolved issues (including 1 from previous review).
Reviewed by Cursor Bugbot for commit 49d5fca.

What does this PR do?
On top of #5637
before:
after:
Before submitting
AI writing disclosure
We welcome the use of AI tools to help with contributions. For transparency and to help us improve our review process, please indicate the level of AI involvement in this PR.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.
Note
Low Risk
Low risk: only updates the tiny-model generation script's `Glm4MoeConfig` constants (no runtime/library code changes), but could affect downstream consumers expecting the previous tiny config.
Overview
Updates the GLM-4.5 tiny-model generation script to stop deriving `vocab_size` from the tokenizer and instead hardcode it, while adding several missing GLM-4.5-aligned config fields (e.g., `moe_intermediate_size`, `head_dim`, attention/eos/pad IDs, RoPE theta, scaling, QK norm, and next-token prediction layers).
This makes the generated tiny checkpoint's config closer to the upstream reference, reducing config diffs when running `print_config_diff` and pushing the tiny model to the hub.
Reviewed by Cursor Bugbot for commit 1961d87. Bugbot is set up for automated code reviews on this repo.
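The effect described in the overview can be illustrated with a minimal, self-contained sketch. It is not the repository's `print_config_diff`, whose signature is not shown in this PR; `config_diff`, `SIZE_KEYS`, and every value in the dictionaries below are hypothetical placeholders chosen only to show why hardcoding `vocab_size` and adding the missing fields shrinks the diff against a reference config.

```python
# Hypothetical helper: NOT the repo's print_config_diff, just an illustration.
MISSING = "<missing>"


def config_diff(tiny, reference, ignore=()):
    """Return {key: (tiny_value, reference_value)} for every differing key."""
    diff = {}
    for key in sorted(set(tiny) | set(reference)):
        if key in ignore:
            continue  # size-related fields are expected to differ in a tiny model
        tiny_value = tiny.get(key, MISSING)
        reference_value = reference.get(key, MISSING)
        if tiny_value != reference_value:
            diff[key] = (tiny_value, reference_value)
    return diff


# Placeholder values only; these are not the real GLM-4.5 settings.
reference = {
    "vocab_size": 1000,
    "head_dim": 128,
    "use_qk_norm": True,
    "hidden_size": 4096,
    "num_hidden_layers": 46,
}

# Before: vocab_size derived from the tokenizer, some fields never set.
tiny_before = {"vocab_size": 987, "hidden_size": 8, "num_hidden_layers": 2}

# After: vocab_size hardcoded to the reference value, missing fields added.
tiny_after = {
    "vocab_size": 1000,
    "head_dim": 128,
    "use_qk_norm": True,
    "hidden_size": 8,
    "num_hidden_layers": 2,
}

# Fields that are intentionally shrunk for a tiny model.
SIZE_KEYS = {"hidden_size", "num_hidden_layers"}

diff_before = config_diff(tiny_before, reference, ignore=SIZE_KEYS)
diff_after = config_diff(tiny_after, reference, ignore=SIZE_KEYS)

for name, diff in (("before", diff_before), ("after", diff_after)):
    print(f"{name}: {len(diff)} differing field(s)")
    for key, (tiny_value, reference_value) in diff.items():
        print(f"  {key}: tiny={tiny_value!r} reference={reference_value!r}")
```

In this sketch the "before" config differs from the reference in three fields, and the "after" config differs in none once the intentionally reduced size fields are ignored.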